AITopics | text excerpt

Collaborating Authors

text excerpt

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

CiteME: Can Language Models Accurately Cite Scientific Claims?

Neural Information Processing SystemsMar-18-2026, 03:18:14 GMT

Thousands of new scientific papers are published each month. Such information overload complicates researcher efforts to stay current with the state-of-the-art as well as to verify and correctly attribute claims. We pose the following research question: Given a text excerpt referencing a paper, could an LM act as a research assistant to correctly identify the referenced paper? We advance efforts to answer this question by building a benchmark that evaluates the abilities of LMs in citation attribution. Our benchmark, CiteME, consists of text excerpts from recent machine learning papers, each referencing a single other paper. CiteME use reveals a large gap between frontier LMs and human performance, with LMs achieving only 4.2-18.5% accuracy and humans 69.7%. We close this gap by introducing CiteAgent, an autonomous system built on the GPT-4o LM that can also search and read papers, which achieves an accuracy of 35.3% on CiteME. Overall, CiteME serves as a challenging testbed for open-ended claim attribution, driving the research community towards a future where any claim made by an LM can be automatically verified and discarded if found to be incorrect.

artificial intelligence, machine learning, natural language, (7 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.77)
Information Technology > Artificial Intelligence > Natural Language (0.63)

Add feedback

FLAWS: A Benchmark for Error Identification and Localization in Scientific Papers

Xi, Sarina, Rao, Vishisht, Payan, Justin, Shah, Nihar B.

arXiv.org Artificial IntelligenceDec-1-2025

The identification and localization of errors is a core task in peer review, yet the exponential growth of scientific output has made it increasingly difficult for human reviewers to reliably detect errors given the limited pool of experts. Recent advances in Large Language Models (LLMs) have sparked interest in their potential to support such evaluation tasks, from academic peer review to automated scientific assessment. However, despite the growing use of LLMs in review systems, their capabilities to pinpoint errors remain underexplored. In this work, we introduce Fault Localization Across Writing in Science (FLAWS), an automated benchmark consisting of 713 paper-error pairs designed to evaluate how effectively LLMs detect errors that undermine key claims in research papers. We construct the benchmark by systematically inserting claim-invalidating errors into peer-reviewed papers using LLMs, paired with an automated evaluation metric that measures whether models can identify and localize these errors. Developing such a benchmark presents unique challenges that we overcome: ensuring that the inserted errors are well-defined, challenging, and relevant to the content of the paper, avoiding artifacts that would make identification trivial, and designing a scalable, automated evaluation metric. On the resulting benchmark, we evaluate five frontier LLMs: Claude Sonnet 4.5, DeepSeek Reasoner v3.1, Gemini 2.5 Pro, GPT 5, and Grok 4. Among these, GPT 5 is the top-performing model, achieving 39.1% identification accuracy when k=10, where k is the number of top-ranked error text candidates generated by the LLM.

excerpt, large language model, machine learning, (21 more...)

arXiv.org Artificial Intelligence

2511.21843

Country:

North America > United States (0.92)
Asia (0.67)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

CiteME: Can Language Models Accurately Cite Scientific Claims?

Neural Information Processing SystemsOct-9-2025, 18:37:52 GMT

Thousands of new scientific papers are published each month.

arxiv preprint arxiv, citeagent, excerpt, (15 more...)

Neural Information Processing Systems

Country:

Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
Asia > Middle East > Jordan (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
(5 more...)

Genre: Research Report > New Finding (1.00)

Industry: Education (0.46)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(5 more...)

Add feedback

CiteME: Can Language Models Accurately Cite Scientific Claims?

Neural Information Processing SystemsMay-26-2025, 16:17:04 GMT

artificial intelligence, machine learning, natural language, (5 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.61)
Information Technology > Artificial Intelligence > Natural Language (0.59)

Add feedback

Using Machine Learning to Distinguish Human-written from Machine-generated Creative Fiction

McGlinchey, Andrea Cristina, Barclay, Peter J

arXiv.org Artificial IntelligenceDec-15-2024

Following the universal availability of generative AI systems with the release of ChatGPT, automatic detection of deceptive text created by Large Language Models has focused on domains such as academic plagiarism and "fake news". However, generative AI also poses a threat to the livelihood of creative writers, and perhaps to literary culture in general, through reduction in quality of published material. Training a Large Language Model on writers' output to generate "sham books" in a particular style seems to constitute a new form of plagiarism. This problem has been little researched. In this study, we trained Machine Learning classifier models to distinguish short samples of human-written from machine-generated creative fiction, focusing on classic detective novels. Our results show that a Naive Bayes and a Multi-Layer Perceptron classifier achieved a high degree of success (accuracy > 95%), significantly outperforming human judges (accuracy < 55%). This approach worked well with short text samples (around 100 words), which previous research has shown to be difficult to classify. We have deployed an online proof-of-concept classifier tool, AI Detective, as a first step towards developing lightweight and reliable applications for use by editors and publishers, with the aim of protecting the economic and cultural contribution of human authors.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2412.15253

Country:

Europe > United Kingdom > Scotland (0.04)
Europe > Poland > Lesser Poland Province > Kraków (0.04)
Europe > Finland > Lapland > Rovaniemi (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Media > News (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.57)

Add feedback

Improving Neuron-level Interpretability with White-box Language Models

Bai, Hao, Ma, Yi

arXiv.org Artificial IntelligenceDec-7-2024

Neurons in auto-regressive language models like GPT-2 can be interpreted by analyzing their activation patterns. Recent studies have shown that techniques such as dictionary learning, a form of post-hoc sparse coding, enhance this neuron-level interpretability. In our research, we are driven by the goal to fundamentally improve neural network interpretability by embedding sparse coding directly within the model architecture, rather than applying it as an afterthought. In our study, we introduce a white-box transformer-like architecture named Coding RAte TransformEr (CRATE), explicitly engineered to capture sparse, low-dimensional structures within data distributions. Our comprehensive experiments showcase significant improvements (up to 103% relative improvement) in neuron-level interpretability across a variety of evaluation metrics. Detailed investigations confirm that this enhanced interpretability is steady across different layers irrespective of the model size, underlining CRATE's robust performance in enhancing neural network interpretability. Further analysis shows that CRATE's increased interpretability comes from its enhanced ability to consistently and distinctively activate on relevant tokens. These findings point towards a promising direction for creating white-box foundation models that excel in neuron-level interpretation.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2410.16443

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > China > Beijing > Beijing (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
(6 more...)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Measuring and Modifying the Readability of English Texts with GPT-4

Trott, Sean, Rivière, Pamela D.

arXiv.org Artificial IntelligenceOct-17-2024

The success of Large Language Models (LLMs) in other domains has raised the question of whether LLMs can reliably assess and manipulate the readability of text. We approach this question empirically. First, using a published corpus of 4,724 English text excerpts, we find that readability estimates produced ``zero-shot'' from GPT-4 Turbo and GPT-4o mini exhibit relatively high correlation with human judgments (r = 0.76 and r = 0.74, respectively), out-performing estimates derived from traditional readability formulas and various psycholinguistic indices. Then, in a pre-registered human experiment (N = 59), we ask whether Turbo can reliably make text easier or harder to read. We find evidence to support this hypothesis, though considerable variance in human judgments remains unexplained. We conclude by discussing the limitations of this approach, including limited scope, as well as the validity of the ``readability'' construct and its dependence on context, audience, and goal.

computational linguistic, excerpt, readability, (14 more...)

arXiv.org Artificial Intelligence

2410.14028

Country:

North America > United States > Wisconsin > Dane County > Madison (0.14)
Europe > Middle East > Malta > Eastern Region > Northern Harbour District > St. Julian's (0.14)
North America > Canada > Ontario > Toronto (0.05)
(8 more...)

Genre: Research Report (1.00)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

CiteME: Can Language Models Accurately Cite Scientific Claims?

Press, Ori, Hochlehnert, Andreas, Prabhu, Ameya, Udandarao, Vishaal, Press, Ofir, Bethge, Matthias

arXiv.org Artificial IntelligenceJul-10-2024

Thousands of new scientific papers are published each month. Such information overload complicates researcher efforts to stay current with the state-of-the-art as well as to verify and correctly attribute claims. We pose the following research question: Given a text excerpt referencing a paper, could an LM act as a research assistant to correctly identify the referenced paper? We advance efforts to answer this question by building a benchmark that evaluates the abilities of LMs in citation attribution. Our benchmark, CiteME, consists of text excerpts from recent machine learning papers, each referencing a single other paper. CiteME use reveals a large gap between frontier LMs and human performance, with LMs achieving only 4.2-18.5% accuracy and humans 69.7%. We close this gap by introducing CiteAgent, an autonomous system built on the GPT-4o LM that can also search and read papers, which achieves an accuracy of 35.3\% on CiteME. Overall, CiteME serves as a challenging testbed for open-ended claim attribution, driving the research community towards a future where any claim made by an LM can be automatically verified and discarded if found to be incorrect.

arxiv preprint arxiv, citeagent, excerpt, (14 more...)

arXiv.org Artificial Intelligence

2407.12861

Country:

Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
Asia > Middle East > Jordan (0.04)
North America > United States > New York > New York County > New York City (0.04)
(4 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.91)

Add feedback

Answerability in Retrieval-Augmented Open-Domain Question Answering

Abdumalikov, Rustam, Minervini, Pasquale, Kementchedjhieva, Yova

arXiv.org Artificial IntelligenceMar-3-2024

The performance of Open-Domain Question Answering (ODQA) retrieval systems can exhibit sub-optimal behavior, providing text excerpts with varying degrees of irrelevance. Unfortunately, many existing ODQA datasets lack examples specifically targeting the identification of irrelevant text excerpts. Previous attempts to address this gap have relied on a simplistic approach of pairing questions with random text excerpts. This paper aims to investigate the effectiveness of models trained using this randomized strategy, uncovering an important limitation in their ability to generalize to irrelevant text excerpts with high semantic overlap. As a result, we observed a substantial decrease in predictive accuracy, from 98% to 1%. To address this limitation, we discovered an efficient approach for training models to recognize such excerpts. By leveraging unanswerable pairs from the SQuAD 2.0 dataset, our models achieve a nearly perfect (~100%) accuracy when confronted with these challenging text excerpts.

excerpt, text excerpt, unanswerable question, (15 more...)

arXiv.org Artificial Intelligence

2403.01461

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States > Texas (0.04)
(2 more...)

Genre: Research Report (0.65)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.32)

Add feedback

Towards an Understanding and Explanation for Mixed-Initiative Artificial Scientific Text Detection

Weng, Luoxuan, Zhu, Minfeng, Wong, Kam Kwai, Liu, Shi, Sun, Jiashun, Zhu, Hang, Han, Dongming, Chen, Wei

arXiv.org Artificial IntelligenceApr-11-2023

Large language models (LLMs) have gained popularity in various fields for their exceptional capability of generating human-like text. Their potential misuse has raised social concerns about plagiarism in academic contexts. However, effective artificial scientific text detection is a non-trivial task due to several challenges, including 1) the lack of a clear understanding of the differences between machine-generated and human-written scientific text, 2) the poor generalization performance of existing methods caused by out-of-distribution issues, and 3) the limited support for human-machine collaboration with sufficient interpretability during the detection process. In this paper, we first identify the critical distinctions between machine-generated and human-written scientific text through a quantitative experiment. Then, we propose a mixed-initiative workflow that combines human experts' prior knowledge with machine intelligence, along with a visual analytics prototype to facilitate efficient and trustworthy scientific text detection. Finally, we demonstrate the effectiveness of our approach through two case studies and a controlled user study with proficient researchers. We also provide design implications for interactive artificial text detection tools in high-stakes decision-making scenarios.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2304.05011

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > United States > New York > New York County > New York City (0.04)
Asia > China > Hong Kong (0.04)
(4 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.66)

Industry: Information Technology > Security & Privacy (0.67)

Technology:

Information Technology > Human Computer Interaction > Interfaces (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback